The indicator for PM2.5 is the annual mean concentration of PM2.5 (a weighted average of measured monitor concentrations and satellite observations, in µg/m³), measured from 2015 to 2017. The indicator for asthma is the spatially modeled, age-adjusted rate of emergency department (ED) visits for asthma per 10,000. These data are also from 2015 to 2017.

The maps show high PM2.5 concentrations centered on Oakland, with values decreasing as distance from Oakland increases. For example, areas north of Santa Rosa have some of the lowest PM2.5 values, while Oakland has the highest.

The maps show high asthma ED visit rates concentrated in areas such as Oakland, Vallejo, and just northwest of San Leandro. Asthma rates decrease with distance from these points.

The best-fit line fits the data poorly at this stage. It passes through the region where most of the data points cluster, but many data points deviate substantially from it.

## 
## Call:
## lm(formula = Asthma ~ PM2.5, data = .)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -54.47 -25.89  -9.61  12.94 182.95 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -116.278     13.040  -8.917   <2e-16 ***
## PM2.5         19.862      1.534  12.950   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 37.49 on 1578 degrees of freedom
## Multiple R-squared:  0.09606,    Adjusted R-squared:  0.09549 
## F-statistic: 167.7 on 1 and 1578 DF,  p-value: < 2.2e-16

Each 1 µg/m³ increase in PM2.5 (x) is associated with an increase of about 19.9 asthma ED visits per 10,000 (y); 9.6% of the variation in y is explained by the variation in x.
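As a worked reading of the coefficient table above, the fitted line can be written out explicitly. The values are copied from the summary output; the hat notation is introduced here only for illustration.

$$
\widehat{\text{Asthma}} = -116.278 + 19.862 \times \text{PM2.5}, \qquad R^2 = 0.09606 \approx 9.6\%
$$

One consequence worth noting: the line crosses zero at

$$
\text{PM2.5} = \frac{116.278}{19.862} \approx 5.85\ \text{µg/m}^3,
$$

so for any PM2.5 value below that, this model would predict a negative ED visit rate, which is not physically meaningful.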

The mean of the residuals is close to zero, but their distribution is skewed to the right (positive skew): the median residual is -9.61, while the maximum is 182.95. Therefore, I will apply a log transformation to the response variable (Asthma).
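The residual check described above can be sketched in base R. This is a minimal illustration on synthetic data, not the CalEnviroScreen data: the data frame `toy`, its generating parameters, and the helper `skewness()` are all assumptions made up for the example. Only the column names `Asthma` and `PM2.5` mirror the model formula.

```r
# Synthetic stand-in data (NOT the real data): a right-skewed response
# generated on the log scale, with arbitrary illustrative parameters.
set.seed(1)
n <- 500
toy <- data.frame(PM2.5 = runif(n, 6, 11))
toy$Asthma <- exp(0.7 + 0.35 * toy$PM2.5 + rnorm(n, sd = 0.65))

# Fit the untransformed and the log-transformed models.
fit_raw <- lm(Asthma ~ PM2.5, data = toy)
fit_log <- lm(log(Asthma) ~ PM2.5, data = toy)

# Hypothetical helper: moment-based sample skewness (base R has none built in).
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

res_raw <- residuals(fit_raw)
res_log <- residuals(fit_log)

# Residual mean is ~0 by construction for least squares with an intercept,
# so the skewness is the more informative diagnostic.
print(c(mean_raw = mean(res_raw), skew_raw = skewness(res_raw)))
print(c(mean_log = mean(res_log), skew_log = skewness(res_log)))
```

A clearly positive skewness for the raw residuals alongside a value near zero for the log-model residuals would support the transformation; `plot(density(res_raw))` gives the same picture visually.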

This line fits the data much better than the one in the previous plot. It is well centered among the data points and better reflects their central tendency.

## 
## Call:
## lm(formula = log(Asthma) ~ PM2.5, data = ces4_map)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.00402 -0.46479  0.03313  0.42298  1.75525 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.69234    0.22840   3.031  0.00248 ** 
## PM2.5        0.35633    0.02686  13.264  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6566 on 1578 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.1003, Adjusted R-squared:  0.09974 
## F-statistic: 175.9 on 1 and 1578 DF,  p-value: < 2.2e-16

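Each 1 µg/m³ increase in PM2.5 (x) is associated with an increase of about 0.356 in log(Asthma), that is, in log(y). After applying the log transform, about 10.0% of the variation in log(y) is explained by the variation in x (multiple R-squared 0.1003, adjusted 0.09974), only a slight improvement over the 9.6% of the untransformed model.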
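Because the response is on the log scale, the slope is easier to interpret after exponentiating. The following worked step uses the coefficients from the summary output above; the hat notation is introduced here only for illustration.

$$
\log\big(\widehat{\text{Asthma}}\big) = 0.69234 + 0.35633 \times \text{PM2.5}
$$

$$
e^{0.35633} \approx 1.428
$$

Under this model, each additional 1 µg/m³ of PM2.5 is associated with the predicted asthma ED visit rate being multiplied by about 1.428, that is, roughly 43% higher.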

The map of residuals from the log model shows overcorrelation (positive residuals, where observed asthma rates exceed what the model predicts from PM2.5) in areas around Vallejo and Antioch, as well as between Alameda and San Leandro. Conversely, the map shows undercorrelation (negative residuals, where observed asthma rates fall below the prediction) in areas around Stanford, which has the most negative residual, and to a lesser degree in Cupertino and Menlo Park. Some areas just outside Berkeley are also undercorrelated. Stanford's low residual might be because students typically live on campus for only a short period, and thus might not be exposed to PM2.5 long enough to develop asthma.

Plotting only the 20% most over- and undercorrelated areas shows these areas even more clearly.
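One way to pick out these extremes is to cut the residuals at their 20th and 80th percentiles. The sketch below is an assumption about how the selection could be done, not necessarily how the map was produced: it reads "20%" as the lowest 20% plus the highest 20% of residuals, and it runs on synthetic data. The data frame `toy`, the `tract` column, and the group labels are hypothetical.

```r
# Synthetic stand-in data (NOT the real data), arbitrary parameters.
set.seed(2)
n <- 200
toy <- data.frame(tract = seq_len(n), PM2.5 = runif(n, 6, 11))
toy$Asthma <- exp(0.7 + 0.35 * toy$PM2.5 + rnorm(n, sd = 0.65))

# Residuals from the log model, stored alongside each row.
fit_log <- lm(log(Asthma) ~ PM2.5, data = toy)
toy$residual <- residuals(fit_log)

# 20th and 80th percentile cut points of the residuals.
cuts <- quantile(toy$residual, probs = c(0.2, 0.8))

# Label each row: lowest 20% = "under", highest 20% = "over".
toy$group <- cut(toy$residual,
                 breaks = c(-Inf, cuts, Inf),
                 labels = c("under", "middle", "over"))

# Keep only the two extreme groups for plotting.
extremes <- droplevels(toy[toy$group != "middle", ])
print(table(extremes$group))
```

With real data that contains missing values, fitting with `na.action = na.exclude` makes `residuals()` return `NA` for the dropped rows, so the residual vector stays aligned with the rows of the data frame.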